Support for Qwen-Image, Qwen-Image-Edit, and Qwen-Image-Edit-Plus#2072

Merged
Acly merged 27 commits intoAcly:mainfrom
Danamir:qwen-image
Oct 8, 2025
Conversation

@Danamir (Contributor) commented Oct 5, 2025

Basic support for Qwen-Image models, including Nunchaku SVDQ quantized versions.

I did not touch the models and node auto-installation part, as I'm not familiar with it. For now you can try this PR as long as you have a Qwen-Image model downloaded (normal, GGUF, or SVDQ) and recent versions of ComfyUI and ComfyUI-nunchaku.

You can also load a Lightning LoRA from https://huggingface.co/lightx2v/Qwen-Image-Lightning .

When using Lightning versions I recommend creating specific presets with minimum_steps set to 1. For example:

    "ER SDE - BETA (lightning)": {
        "sampler": "er_sde",
        "scheduler": "beta",
        "steps": 4,
        "cfg": 1.0,
        "minimum_steps": 1
    }

NB: I'll try to add Qwen-Image-Edit support, and maybe basic Qwen-Image-Edit-Plus (i.e. 2509) support with only one layer. For the latter this will heavily limit the possibilities offered by the model, but I'd rather have a separate PR for the UI modifications needed to handle multiple edit sources.

  • Qwen architecture definitions: Arch.qwen, Arch.qwen_e, Arch.qwen_e_p
  • VAE
  • Text encoder
  • Qwen-Image checkpoint & GGUF, Nunchaku-SVDQ
  • Qwen-Image-Edit checkpoint & GGUF, Nunchaku-SVDQ
    • TextEncodeQwenImageEdit + ReferenceLatent
  • Qwen-Image-Edit-Plus checkpoint & GGUF, Nunchaku-SVDQ
    • TextEncodeQwenImageEditPlus + chained ReferenceLatent nodes
  • UI Edit mode + add references
    • Bug on UI when using Qwen-Image and linked Edit model (will be fixed later)
  • Auto CPU offloading

@Danamir (Contributor Author) commented Oct 5, 2025

This should fix #1939, #2032, #2066.

@Danamir (Contributor Author) commented Oct 5, 2025

Basic support for Qwen-Image-Edit done, with TextEncodeQwenImageEdit (not the Plus version). I purposefully left the VAE input empty to force the use of ReferenceLatent, as it fixes the unzoom edit bug.

@Danamir (Contributor Author) commented Oct 5, 2025

I hard-coded the settings that work for my GPU for NunchakuQwenImageDiTLoader, but they should probably be left to the user, maybe in the performance settings?

model = w.nunchaku_load_qwen_diffusion_model(
    model_info.filename,
    cpu_offload="enable",
    num_blocks_on_gpu=16,
    use_pin_memory="disable",
)

@Acly (Owner) commented Oct 5, 2025

I got an error here if I don't override the checkpoint resolution in the style. The resolution range can probably be flexible like Flux, although I notice the Edit model doesn't follow instructions properly when the resolution isn't ~1MP.

Apart from that, it works well!

Basic support for Qwen-Image-Edit done, with TextEncodeQwenImageEdit (not the Plus version).

Is there any downside from using the Plus version?

but I'd rather have a separate PR for the UI modifications needed to handle multiple edit sources.

You just create some "Reference" control layers for more image sources. The UI side of this already works, and multiple image inputs are handled in apply_edit_conditioning by stitching them together for Flux Kontext. It probably works for Qwen too, but maybe it can be improved with the TextEncodeQwenImageEditPlus multi-input node.

I purposefully left the VAE input empty to force the use of ReferenceLatent, as it fixes the unzoom edit bug.

It's still unclear to me if this is really a bug, or an issue with input resolutions. Or what exactly the difference is between passing the image to the encode node vs. using ReferenceLatent...

I hard-coded the settings that work for my GPU for NunchakuQwenImageDiTLoader, but they should probably be left to the user, maybe in the performance settings?

I don't think this should be the burden of the user, unless as a last resort. It's a bit annoying that the nunchaku nodes don't figure this out themselves; they have the most information. For the Flux loader there was at least a heuristic.

@Danamir (Contributor Author) commented Oct 5, 2025

I got an error here if I don't override the checkpoint resolution in the style. The resolution range can probably be flexible like Flux, although I notice the Edit model doesn't follow instructions properly when the resolution isn't ~1MP.

You're right, I always had a custom resolution in my styles. I'll add a valid resolution range.
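A hypothetical sketch of such a resolution range check (the function name and the bounds are my assumptions, not values from the PR): sizes inside a supported megapixel window pass through unchanged, and anything outside is rescaled toward ~1MP while keeping aspect ratio and rounding to multiples of 8:

```python
import math

def clamp_to_megapixel_range(width: int, height: int,
                             min_mp: float = 0.25, max_mp: float = 2.0) -> tuple[int, int]:
    """If total pixels fall outside [min_mp, max_mp] megapixels, rescale
    toward 1MP while preserving aspect ratio, rounded to multiples of 8.
    The bounds are illustrative guesses, not the PR's actual values."""
    mp = width * height / 1_000_000
    if min_mp <= mp <= max_mp:
        return width, height
    scale = math.sqrt(1_000_000 / (width * height))
    return (round(width * scale / 8) * 8, round(height * scale / 8) * 8)
```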

Is there any downside from using the Plus version?

Sadly the Plus version is pretty bad at transforming styles (e.g. to pixel art, drawing, etc.). It is better at everything else though. And the 1MP transform seems to be handled directly by the text encode Plus node.

Right now I'm in the middle of adding a new architecture Arch.qwen_e_p and a property Arch.is_qwen_like. 😅 It may be overkill...

You just create some "Reference" control layers for more image sources.

I just saw that when copying the Flux code. I'll use that for TextEncodeQwenImageEditPlus, it should work great! I'll have to find a way to limit the additional sources to 3.

It's still unclear to me if this is really a bug, or an issue with input resolutions.

I think it's a little bit of both, but there is no real downside to using ReferenceLatent; it gives different results, but those are rarely worse.

Concerning Qwen-Image-Edit-Plus, I find it even more prone to zooming out, and ReferenceLatent can't be used. There were some formulas on Reddit trying to work out the internal resizing of TextEncodeQwenImageEditPlus, but it was really strange, like multiples of 16 but with an offset of 7-10 to the top left...

I don't think this should be the burden of the user, unless as a last resort.

We could simply detect the amount of VRAM and, below 16 GB, set cpu_offload=enable, num_blocks_on_gpu=1, use_pin_memory=disable; otherwise disable CPU offloading.

The num_blocks_on_gpu can be left at 1; performance stays the same, and the only change when raising it is that more of the model stays on the GPU, keeping some of your system RAM available.
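A minimal sketch of that heuristic (the function name is mine; in practice the VRAM amount would come from something like torch.cuda.get_device_properties(0).total_memory):

```python
def nunchaku_offload_settings(vram_gb: float) -> dict:
    """Hypothetical heuristic for NunchakuQwenImageDiTLoader options:
    below 16 GB of VRAM, enable CPU offloading with a single block kept
    on the GPU and pinned memory disabled; otherwise disable offloading."""
    if vram_gb < 16:
        return {
            "cpu_offload": "enable",
            "num_blocks_on_gpu": 1,
            "use_pin_memory": "disable",
        }
    return {"cpu_offload": "disable"}
```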

def from_string(s: str, filename: str | None = None):
    if s == "svdq":
        return Quantization.svdq
    elif filename and "qwen" in filename and "svdq" in filename:
@Acly (Owner):

Why was this needed? Was there a svdq file that wasn't detected?

@Danamir (Contributor Author):

I added some logs to find what the quantization value was, and nothing is returned in the model entity! I had to resort to the filename trick...

I'll try some more debugging.

@Acly (Owner):

Ah you're right, the quant field wasn't set by the model detection

Fixed here: Acly/comfyui-tooling-nodes@f555efb

@Acly (Owner) commented Oct 5, 2025

Right now I'm in the middle of adding a new architecture Arch.qwen_e_p and a property Arch.is_qwen_like. 😅 It may be overkill...

Yea, I think Arch should really be for different architectures, or at least "model ecosystems". I'm actually not even sure if Flux Kontext and Qwen Edit deserve their own Arch (edit models are technically just finetunes), but there are lots of things that don't work for edit models, so it makes some sense.

@Danamir (Contributor Author) commented Oct 5, 2025

In the meantime I added the Arch.qwen_e_p architecture support. We can always refactor it with a cleaner solution.

It works like a charm with a single image. I just have to solve the problem preventing Qwen-Edit from adding reference images, since it has neither controlnets nor ipadapter.


[edit]: found it, this should do the trick:

if self.mode.is_ip_adapter and models.arch in (Arch.flux_k, Arch.qwen_e_p):
    is_supported = True  # Reference images are merged into the conditioning context

@Danamir (Contributor Author) commented Oct 5, 2025

Here we go! Qwen-Image-Edit-Plus support. It even works with selections, which makes it a pretty useful inpaint model.

[screenshots: ComfyUI_00180_, ComfyUI_00377_]

@Danamir (Contributor Author) commented Oct 5, 2025

Still a little bit of a zoom problem (the top of the head is cropped), but still pretty good!

"Extract the man in image1, use the poses from image2. White background, concept art."


@Danamir Danamir changed the title Qwen image basic support Support for Qwen-Image, Qwen-Image-Edit, and Qwen-Image-Edit-Plus Oct 5, 2025
@shadowlocked commented:
This is fabulous work - I have been using Qwen on ComfyUI for the first time while waiting for it to hit Krita Diffusion. The only thing I would say is to make sure that the (locked) new Qwen default doesn't have a Nunchaku dependency. After a recent update, I could no longer get the default locked Kontext setting to work, as I didn't have Nunchaku installed and just got an error.

The problem there was that in the 'default' Kontext style, 'Diffusion Architecture' was set to 'Automatic', and of course that cannot be changed by the user. 'Automatic' turned out to have Nunchaku requirements(!). So I just copied the locked default and changed the Diffusion Architecture to 'Flux Kontext'.

@Danamir (Contributor Author) commented Oct 5, 2025

make sure that the (locked) new Qwen default doesn't have a Nunchaku dependency.

The PR has been tested with GGUF and Nunchaku versions. I do not have a non-quantized Qwen version, nor do I have a ComfyUI installation without the nunchaku package.

Feel free to give it a try; theoretically there should be no problem without it.

@Danamir (Contributor Author) commented Oct 5, 2025

Here is the code of the internal resize in ComfyUI's TextEncodeQwenImageEditPlus:

samples = image.movedim(-1, 1)
total = int(384 * 384)

scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
width = round(samples.shape[3] * scale_by)
height = round(samples.shape[2] * scale_by)

s = comfy.utils.common_upscale(samples, width, height, "area", "disabled")
images_vl.append(s.movedim(1, -1))
if vae is not None:
    total = int(1024 * 1024)
    scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
    width = round(samples.shape[3] * scale_by / 8.0) * 8
    height = round(samples.shape[2] * scale_by / 8.0) * 8

    s = comfy.utils.common_upscale(samples, width, height, "area", "disabled")
    ref_latents.append(vae.encode(s.movedim(1, -1)[:, :, :, :3]))

It looks like a simple fixed-ratio resize to 1048576 total pixels (i.e. 1024x1024), then rounded to a multiple of 8. There is some strange stuff with movedim(-1, 1), but I'm not familiar with that method.
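The dimension math in that snippet can be reproduced standalone; a small sketch of just the size computation (the function name is mine, not ComfyUI's):

```python
import math

def edit_plus_sizes(width: int, height: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Compute the two target sizes used by TextEncodeQwenImageEditPlus:
    a ~384*384-total-pixel image for the vision-language prompt, and a
    ~1024*1024-total-pixel image (rounded to multiples of 8) for the VAE.
    Both scalings keep the aspect ratio."""
    vl_scale = math.sqrt(384 * 384 / (width * height))
    vl = (round(width * vl_scale), round(height * vl_scale))
    vae_scale = math.sqrt(1024 * 1024 / (width * height))
    vae = (round(width * vae_scale / 8.0) * 8, round(height * vae_scale / 8.0) * 8)
    return vl, vae
```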

@Acly (Owner) commented Oct 5, 2025

It looks like a simple fixed-ratio resize to 1048576 total pixels (i.e. 1024x1024), then rounded to a multiple of 8. There is some strange stuff with movedim(-1, 1), but I'm not familiar with that method.

Yea. It resizes all images to (384*384) total pixels while keeping aspect ratio and attaches that to the LLM prompt.

If VAE is provided it also resizes all images to (1024*1024) total pixels while keeping aspect ratio and attaches them to conditioning as if by using the Reference Latent node.

The movedim is just to convert between the file/display image layout and the inference memory layout.

The question is whether the automatic resize is good or bad. For Flux Kontext I omitted it. Although it follows prompts better when images stay around 1MP, it can really degrade quality. Probably it's similar here. If I understood you correctly, the internal resize makes the "zoom in" behavior more likely.

In any case we should pass the images, and don't need to pass the VAE, since we already handle ReferenceLatent anyway.

@Danamir (Contributor Author) commented Oct 5, 2025

In any case we should pass the images, and don't need to pass the VAE, since we already handle ReferenceLatent anyway.

The problem is that ReferenceLatent cannot be used reliably with multiple images like TextEncodeQwenImageEditPlus does. Every time I tried to combine those two nodes with more than one image, I did not get a coherent edit. If you only stitch the images together, you get worse results.

I was thinking of something more along the lines of resizing the expected latent output to the same size as the first internal resize, to see if it helps with the drifting.

It's too bad the internal scale cannot be avoided. Maybe I should try some tests with a modified copy of the node...

[edit]: Note that I did not try to chain multiple ReferenceLatent nodes, each with one image. Performance was already greatly affected by only one node.

[edit2]: I inspected the code of ReferenceLatent; it seems to do the exact same thing as the code used in TextEncodeQwenImageEditPlus. It seems you're right: we should be able to ignore the VAE and chain the ReferenceLatent nodes. I'll do some more tests in a custom workflow.

@Danamir (Contributor Author) commented Oct 5, 2025

Yes! By completely ignoring the resize we get a pixel-perfect edit, even when using 1856x1440 and 1536x1536 reference images. Performance was pretty bad because of the size, but the output is really good.

It seems to work even better if the reference latents are resized to share the same short-side dimension.

I'll update the code.
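A possible sketch of that short-side normalization (assuming plain proportional scaling with latent-friendly rounding; the function name is mine and this is not the actual code from the PR):

```python
def match_short_sides(sizes: list[tuple[int, int]], multiple: int = 8) -> list[tuple[int, int]]:
    """Rescale each (width, height) so all images share the smallest
    short side among the inputs, rounding to a latent-friendly multiple."""
    target = min(min(w, h) for w, h in sizes)
    out = []
    for w, h in sizes:
        scale = target / min(w, h)
        out.append((round(w * scale / multiple) * multiple,
                    round(h * scale / multiple) * multiple))
    return out
```

For example, a 1856x1440 and a 1536x1536 reference would both end up with a short side of 1440.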

@Danamir (Contributor Author) commented Oct 6, 2025

Got a strange bug when using a Qwen-Image model with a linked Edit model in the style.

When I try to add a reference image in Generate mode, it rightly tells me that it's not supported.

But if I switch to Edit mode, the reference disappears, and clicking the + button only adds more erroneous references to the Generate side.


I'll try and see if I can figure this out, but it's not high on the priority list for the core functionality.

@Danamir (Contributor Author) commented Oct 6, 2025

Using NunchakuQwenImageDiTLoader.cpu_offload="auto" might be enough for most usages. It automatically turns on CPU offloading with less than 16 GB of VRAM. The rest of the options are set to the node's default values.

@Acly (Owner) commented Oct 6, 2025

But if I switch to Edit mode, the reference disappears, and clicking the + button only adds more erroneous references to the Generate side.

Yea that's a bug, the buttons are always linked to the image model regions. I can fix it later.

@Acly (Owner) commented Oct 6, 2025

Question: do we really need to support both edit model variants?

From what I understood, 2509 is an evolution and may be further improved. Even if it doesn't do everything 100% better I'd rather not support the old version if it's almost obsolete already, and will probably be more so in the future.

@Acly (Owner) commented Oct 6, 2025

Another question: which models support LoRA? Do any of the Nunchaku quants support them?

I wonder if we need to disable the LoRA UI, or at least make sure there's an error of some kind explaining that LoRAs can't be used. Not sure what the timeline is for Nunchaku Qwen LoRA support; it would be really nice to have (also to only have one model plus a Lightning sampler which uses the LoRA).

@Danamir (Contributor Author) commented Oct 6, 2025

Another question: which models support LoRA? Do any of the Nunchaku quants support them?

I wonder if we need to disable the LoRA UI, or at least make sure there's an error of some kind explaining that LoRAs can't be used. Not sure what the timeline is for Nunchaku Qwen LoRA support; it would be really nice to have (also to only have one model plus a Lightning sampler which uses the LoRA).

I'm with you on this one, I really dislike having to use many models where a few LoRA would be enough.

The Nunchaku code to support LoRA loading is supposedly ready, cf. this comment: nunchaku-ai/ComfyUI-nunchaku#479 (comment). I don't know how long it will take for it to be available in ComfyUI.

@Danamir (Contributor Author) commented Oct 6, 2025

Question: do we really need to support both edit model variants?

Sadly the 2509 variant is seriously lacking in the styling department. It's a joke how bad it is, not even usable for this purpose. That's why I decided to keep both the recent and older architectures for now; otherwise we would lose a pretty nice feature.

As soon as they create a model capable of both, we can drop the older code! I'm pretty sure they won't go back to the simple TextEncodeQwenImageEdit node in future iterations, but who knows if they'll require a new node for each version...

I'd rather keep the backward compatibility for now, as long as it does not increase the technical debt too much.

@Danamir (Contributor Author) commented Oct 7, 2025

I just saw on another computer that the Qwen SVG icons require a local font to be displayed correctly; I wrongly thought Inkscape would vectorize the text. 😅

I'll let you create better ones, those were only placeholders anyway.

@Acly Acly merged commit ff40cf4 into Acly:main Oct 8, 2025
2 checks passed
@Acly (Owner) commented Oct 9, 2025

I made a small follow-up here: #2076

Have a look if you want. I moved the qwen-text-encode a bit further, to the same level as the regular clip-text-encode; hope I didn't break anything.

Thanks for your work!

@shadowlocked commented Oct 10, 2025

The PR has been tested with GGUF and Nunchaku versions. I do not have a non-quantized Qwen version, nor do I have a ComfyUI installation without the nunchaku package.

I just found out that the same behavior occurs in a vanilla install for the Chroma profile: if the user does not have Nunchaku installed and prioritized, the 'default' behavior throws an error, presumably because your own system defaults to Nunchaku. The non-Nunchaku user has to duplicate the affected profile and know to change the diffusion architecture from 'Automatic' to 'Chroma'.

Also the VAE choice has to be changed from 'automatic' (to, for instance, ae.safetensors).

I understand that you don't want to affect an installation that you clearly like, but for the sake of other users, could it not be forked into a version where Nunchaku is not the default? This would mean that the current issues for non-Nunchaku users trying to use Qwen, Chroma, etc. would not manifest.


@Acly (Owner) commented Oct 10, 2025

@shadowlocked I think your problem has nothing to do with Nunchaku. Chroma doesn't have a Nunchaku version at all. The built-in Flux/Qwen styles are set up to look for a number of potential models, including Nunchaku and non-Nunchaku variants, and take whatever they find.

Maybe it's just that you put the model in the wrong folder? It should be in diffusion_models. Or you have multiple conflicting VAE models?
